Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation

Authors

  • Simone Parisi
  • Matteo Pirotta
  • Marcello Restelli
Abstract

Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto optimality, and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier remains an important challenge. In this paper, we propose a reinforcement learning policy gradient approach to learn a continuous approximation of the Pareto frontier in multi-objective Markov Decision Problems (MOMDPs). Unlike previous policy gradient algorithms, where n optimization routines are executed to obtain n solutions, our approach performs a single gradient ascent run, generating at each step an improved continuous approximation of the Pareto frontier. The idea is to optimize the parameters of a function defining a manifold in policy parameter space, so that the corresponding image in objective space gets as close as possible to the true Pareto frontier. Besides deriving how to compute and estimate such a gradient, we also discuss the non-trivial issue of defining a metric to assess the quality of candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two problems: a linear-quadratic Gaussian regulator and a water reservoir control task.
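The mechanics of the approach can be sketched compactly. In the toy Python example below (a minimal sketch, not the authors' implementation), two conflicting quadratic objectives stand in for the MOMDP returns, the manifold is a segment in parameter space whose endpoints `rho` are the learnable parameters, the frontier-quality metric is the 2-D hypervolume, and a finite-difference approximation stands in for the analytic gradient derived in the paper. All names (`phi`, `frontier_quality`, `hypervolume_2d`) are illustrative.

```python
import numpy as np

# Toy stand-in for a MOMDP (illustrative, not the paper's setting): two
# conflicting quadratic objectives over a 2-D "policy parameter" theta.
a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])

def J(theta):
    # Vector of objectives to maximize; the true Pareto set is the
    # segment between a and b.
    return np.array([-np.sum((theta - a) ** 2), -np.sum((theta - b) ** 2)])

def phi(rho, t):
    # Manifold map phi_rho : [0, 1] -> parameter space. Here it is a
    # segment whose endpoints rho[:2] and rho[2:] are the learnable
    # high-level parameters.
    return (1 - t) * rho[:2] + t * rho[2:]

def hypervolume_2d(points, ref):
    # Hypervolume (for maximization) of the points dominating ref:
    # sweep in decreasing J1, adding a strip whenever J2 improves.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: -p[0])
    hv, best2 = 0.0, ref[1]
    for p1, p2 in pts:
        if p2 > best2:
            hv += (p1 - ref[0]) * (p2 - best2)
            best2 = p2
    return hv

def frontier_quality(rho, ts, ref=np.array([-4.0, -4.0])):
    # One possible metric on candidate frontiers: hypervolume of the
    # manifold's image in objective space.
    return hypervolume_2d([J(phi(rho, t)) for t in ts], ref)

ts = np.linspace(0.0, 1.0, 21)
rho = np.array([-1.0, 1.0, 2.0, 1.0])        # poor initial segment
for _ in range(500):
    grad = np.zeros_like(rho)
    for i in range(rho.size):                # finite differences stand in
        e = np.zeros_like(rho)               # for the analytic gradient
        e[i] = 1e-4
        grad[i] = (frontier_quality(rho + e, ts)
                   - frontier_quality(rho - e, ts)) / 2e-4
    rho += 0.05 * grad / (np.linalg.norm(grad) + 1e-8)  # normalized step
print(rho)  # endpoints should drift toward a=(0,0) and b=(1,0)
```

Running the loop moves the segment's image toward the true Pareto frontier; the paper's contribution is to make the corresponding gradient computable analytically and estimable from trajectories of stochastic policies.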

Related papers

Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material

This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Unlike previous policy-gradient multi-objective algorithms, where n optimization routines are used to obtain n solutions, our approach perf...


Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation

This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Unlike previous policy-gradient multi-objective algorithms, where n optimization routines are used to obtain n solutions, our approach per...


Manifold-based multi-objective policy search with sample reuse

Many real-world applications are characterized by multiple conflicting objectives. In such problems, optimality is replaced by Pareto optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an imp...


A multiobjective reinforcement learning approach to water resources systems operation: Pareto frontier approximation in a single run

The operation of large-scale water resources systems often involves several conflicting and noncommensurable objectives. The full characterization of tradeoffs among them is a necessary step to inform and support decisions in the absence of a unique optimal solution. In this context, the common approach is to consider many single objective problems, resulting from different combinations of ...
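The "many single objective problems" referred to here are, in the textbook formulation (a general statement, not this paper's specific setup), weighted-sum scalarizations of the m objectives:

```latex
\max_{\theta} \; J_{\mathbf{w}}(\theta) = \sum_{i=1}^{m} w_i \, J_i(\theta),
\qquad w_i \ge 0, \quad \sum_{i=1}^{m} w_i = 1 .
```

Each weight vector yields at most one Pareto-optimal solution (and only on the convex part of the frontier), so a dense approximation requires one full optimization run per weight vector; avoiding that repeated cost is precisely the point of single-run methods like the one above.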


A general framework for evolutionary multiobjective optimization via manifold learning

Under certain mild conditions, the Pareto-optimal set (PS) of a continuous multiobjective optimization problem with m objectives is a piecewise continuous (m-1)-dimensional manifold. This regularity property is important, yet has unfortunately been ignored in many evolutionary multiobjective optimization (EMO) studies. The first work that explicitly takes advantage of this regularity propert...
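A standard worked example (not drawn from the cited paper) makes the regularity property concrete for m = 2. Take two quadratic objectives over R^n:

```latex
\min_{x \in \mathbb{R}^n} \bigl( f_1(x), f_2(x) \bigr),
\qquad f_1(x) = \lVert x - a \rVert^2, \quad f_2(x) = \lVert x - b \rVert^2 .
```

Pareto optimality requires the two gradients to be opposed, i.e. for some weight \lambda \in [0, 1]:

```latex
\lambda \nabla f_1(x) + (1 - \lambda) \nabla f_2(x) = 0
\;\Longrightarrow\;
x^{\star}(\lambda) = \lambda a + (1 - \lambda) b ,
```

so the Pareto-optimal set is the segment joining a and b: a one-dimensional, i.e. (m-1)-dimensional, manifold regardless of the ambient dimension n.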



Journal: J. Artif. Intell. Res.

Volume: 57

Published: 2016